home *** CD-ROM | disk | FTP | other *** search
Text File | 1995-03-10 | 21.6 KB | 682 lines | [TEXT/MPS ] |
-
- PCCTS 1.30 Release Notes
-
- Active Authors for 1.30 Release:
-
- Terence Parr
- Parr Research Corporation
- 5517 Pleasant Ave
- Minneapolis, MN 55419
- parrt@acm.org
-
- Russell W. Quong
- School of Electrical Engineering
- Purdue University
- W. Lafayette, IN 47907
- quong@ecn.purdue.edu
-
- Other authors:
-
- Will Cohen, cohenw@ecn.purdue.edu
- Hank Dietz, hankd@ecn.purdue.edu
-
- November 1, 1994
-
-
- This document describes the 1.30 release of the Purdue Compiler
- Construction Tool Set (PCCTS). This file describes the changes
- made to PCCTS 1.23 to arrive at PCCTS 1.30. A number of major new
- features were added and a few bug fixes were made. We imagine
- that this will be the last major feature addition to the ANTLR
- description language before a 2.00 version can be built. The
- original 1.00 manual and all release notes are required for a
- complete documentation set for PCCTS. A book is in the works and
- the papers provided at the ftp site don't hurt.
-
- PCCTS is in the public-domain and can be obtained at
- everest.ee.umn.edu in pub/pccts/1.30. The newsgroup
- comp.compilers.tools.pccts provides a discussion forum. To
- receive future release broadcast messages, register yourself by
- sending email to pccts@ecn.purdue.edu with a ``Subject:'' line of
- ``register''.
-
- The authors make no claims that this software will do what you
- want, that this manual is any good, or that the software actually
- works---use PCCTS at your own risk. Bug reports and/or cheery
- reports of its usefulness are very welcome, however.
-
- The maintenance and support of all PCCTS tools is primarily
- provided by Parr Research Corporation, Minneapolis MN. Please see
- file PCCTS.FUTURE for more information. All PCCTS tools currently
- in the public domain will continue to be in the public domain.
-
-
- I. NEW FEATURES
-
- The only new features are element labels and parser exception
- handling, but they significantly increase the capabilities of ANTLR.
- The programmer now has complete control over how errors (semantic and
- syntactic) are handled. Further, the element labels are generally
- useful and should be considered the preferred method for accessing
- lexical information about tokens. [Please note that parser exception
- handling is to be considered in alpha-release state].
-
- A. ELEMENT LABELING
-
- Actions in an ANTLR grammar may access attributes via labels of the
- form ``$label'' attached to token and rule references rather than the
- conventional ``$i'' for some integer i. By using symbols instead of
- integer identifiers, grammars are more readable and action parameters
- are not sensitive to changes in rule element positions.
-
- The form of a label is:
-
- label:element
-
- where element is either a token reference or a rule reference. To
- refer to the attribute (C interface) or token pointer (C++ interface)
- of that element in an action, use
-
- $label
-
- within an action or rule argument list. For example,
-
- a : t:ID <<printf("%s\n", $t->getText());>>
- ;
-
- [using the C++ interface]. To reference the tree variable associated
- with element, use
-
- #label
-
- When using parser exception handling, simply reference ``label'' to
- attach a handler to a particular rule reference. For example,
-
- a : t:b
- exception[t]
- default : <<trap any error found during call to 'b'>>
- ;
-
- Labels must be unique per rule as they have rule scope. Labels may be
- accessed from parser exception handlers.
-
- B. PARSER EXCEPTION HANDLING
-
- ***** CONSIDERED ALPHA QUALITY STUFF *****
-
- ANTLR has two mechanisms for error reporting and recovery. In the
- first mechanism, ANTLR automatically generates error messages using a
- simple, effective heuristic that is sufficient for many applications.
- However, when more sophisticated error handling is required, say for
- commercial-quality software, ANTLR now supports a second mechanism
- called ``parser exception handling'' that provides the flexibility of
- hand-built reporting and recovery in a convenient framework. Parser
- exception handling has much in common with C++ exception handling; we
- do not actually use C++ exceptions in our implementation and, hence,
- parser exception handling can be used with either the ANTLR C or C++
- interface.
-
- Parser exception handling provides a unified framework for reporting
- and recovering from semantic and syntactic errors; note that automatic
- mechanisms typically do not consider semantic errors. Parser
- exception handling provides nearly the flexibility of a hand-built
- parser [We have not played with the semantic recovery yet, but will
- soon--you cannot define your own error signals at this point].
-
- We illustrate the use of parser exception handlers by demonstrating
- how they are used to generate a better error message than the
- automatic mechanism, which simply reports where the error was detected
- and what was expected. Consider matching the rule stat using the
- following grammar fragment,
-
- stat: "if" expr "then" stat { "else" stat }
- | "while" expr "do" stat
- | VAR ":=" expr ";"
- | "begin" ( stat )+ "end"
- ;
-
- expr: atom ( "\+" atom )*
- ;
-
- atom: INT
- | FLOAT
- ;
-
- where INT, FLOAT, and VAR are defined as integer, float and identifier
- tokens. Upon input
-
- if 34+ then i:=1;
-
- the automatic error mechanism would report
-
- line 1: syntax error at "then" missing { INT FLOAT }.
-
- However, because we know the context in which the expr production was
- attempted, an improved error message could be generated; i.e., it
- could indicate the expression was both in an if-statement and that it
- was a conditional--as opposed to the right-hand-side of an assignment
- statement, for example. A better message would be
-
- line 1: if-statement: malformed conditional at "then"
-
- One way to achieve this error message is to modify the original stat
- grammar as follows
-
- stat: "if" e:expr "then" stat { "else" stat }
- exception[e]
- catch MismatchedToken :
- catch NoViableAlt :
- <<
- fprintf(stderr,
- "line %d: if-statement: malformed conditional at \"%s\"\n",
- zzline, LATEXT(1));
- zzconsumeUntilToken(THEN);
- >>
- | "while" expr "do" stat
- ...
- ;
- }
-
- where THEN is the token type associated with "then"; MismatchedToken
- and NoViableAlt are predefined error signals. The notation ``e:expr''
- attaches the label e to the expr rule reference. Labels allow the
- exception handler to catch errors encountered specifically for input
- matched during that reference.
-
- Good error handling requires programmer intervention. Automatic
- mechanisms typically do not perform well, because they cannot easily
- analyze the state of the parser (e.g., the symbol stack of a
- table-driven parser or the program counter of a recursive-descent
- parser). Knowing where to report errors and how to recover from them
- must be done with a programmer's experience. While more programming
- effort is required than for automatic mechanisms, ANTLR's parser
- exception handling provides a convenient, sophisticated mechanism that
- rivals the flexibility of hand-coded schemes.
-
- ANTLR defaults to using the old, automatic mechanism. Any use of the
- keyword "exception" triggers the new mechanism.
-
- We now provide specific ANTLR programming details for parser exception
- handling.
-
- Syntax
-
- A group of exception handlers may be specified after any alternative,
- after the ";" of a rule definition, and default/global exceptions may
- be specified before the list of rules. The syntax for an exception
- group is as follows:
-
- exception_group
- : "exception" { Label_in_brackets }
- ( exception_handler )*
- { "default" ":" Action }
- ;
-
- exception_handler
- : "catch" SIGNAL ":" { Action }
- ;
-
- where SIGNAL is one of:
-
- NoViableAlt -- None of the alternatives were predicted
- NoSemViableAlt -- See below
- MismatchedToken -- Didn't find the token we were looking for
-
- You can also use a "default :" clause, which matches any signal, in
- your exception group.
-
- Currently, you cannot define your own signals.
-
- You can define multiple signals for a single action. E.g.,
-
- exception
- catch MismatchedToken :
- catch NoViableAlt :
- catch NoSemViableAlt :
- <<
- printf("stat:caught predefined signal\n");
- consumeUntil(DIE_set);
- >>
-
- If a label (attached to a rule reference) is specified for an
- exception group, that group may be specified after the end of the ";"
- rule terminator. ANTLR can still uniquely identify which rule
- reference to associate the group with; it often makes a rule cleaner
- to have most of the exception handlers at the end of the rule. For
- example,
-
- a : A t:expr B
- | ...
- ;
- exception[t]
- catch ...
- catch ...
-
- The NoViableAlt signal only makes sense for exception groups with
- labels and for rule exception groups.
-
-
- Exception Handler Order of Execution
-
- Given a signal, S, the handler that is invoked is determined by
- looking through the list of enabled handlers in a specific order.
-
- Loosely speaking, we say that a handler is "enabled" (becomes active)
- and pushed on an exception stack when it has been seen by the parser
- on its way down the parse tree. A handler is "disabled" and taken off
- the exception stack when the associated grammar fragment is
- successfully parsed. The formal rules for enabling are:
-
- o All global/default handlers are enabled upon parser entry.
-
- o Exception handlers specified after an alternative become enabled
- when that alternative is predicted.
-
- o Exception handlers specified for a rule become enabled when the
- rule is invoked.
-
- o Exception handlers tied to a particular rule reference within
- an alternative are enabled just before the invocation of that
- rule reference.
-
- Disabling rules are:
-
- o All global/default handlers are disabled upon parser exit.
-
- o Exception handlers specified after an alternative are disabled
- when that alternative has been parsed completely (successfully).
-
- o Exception handlers specified for a rule become disabled just
- before the rule returns.
-
- o Exception handlers tied to a particular rule reference within
- an alternative are disabled just after the return from that
- rule reference.
-
- Upon an error condition, the parser with throw an exception signal, S;
- [in the future, the user will be able to throw user-defined
- exceptions]. Starting at the top of the stack, each exception group
- is examined looking for a handler for S. The first S handler found on
- the stack is executed. In practice, the run time stack and hardware
- program counter are used to search for the appropriated handler. This
- amounts to the following:
-
- o If there is an exception specified for the enclosing alternative,
- then look for S in that group first.
-
- o If there is no exception for that alternative or that group did not
- specify an S handler, then look for S in the enclosing rule's
- exception group.
-
- o Global/default handlers are like macros that are inserted into
- the rule exception group for each rule.
-
- o If there is no rule exception or that group did not specify
- an S handler, then return from the enclosing rule with the current
- error signal still set to S.
-
- o If there is an exception group attached (via label) to the rule
- that just returned, check that exception group for S.
-
- o If an exception group attached to a rule reference does not have
- an S handler, then look for S in the enclosing rule's exception
- group.
-
- This process continues until an S handler is found or a return
- instruction is executed in starting rule.
-
- These guidelines are best described with an example:
-
- a : A c B
- exception /* 1 */
- catch MismatchedToken : <<ACTION1>>
- | C t:d D
- exception /* 2 */
- catch MismatchedToken : <<ACTION2>>
- catch NoViableAlt : <<ACTION3>>
- ;
- exception[t] /* 3 */
- catch NoViableAlt : <<ACTION4>>
- exception /* 4 */
- catch NoViableAlt : <<ACTION5>>
-
- c : E
- ;
-
- d : e
- ;
-
- e : F
- | G
- ;
- exception /* 5 */
- catch MismatchedToken : <<ACTION6>>
-
- The following table summarizes the sequence in which the exception
- groups are tested:
-
- INPUT EXCEPTION GROUP TESTING SEQUENCE ACTION EXECUTED
- D E B 4 ACTION5
- A E D 1 ACTION1
- A F B 1 ACTION6
- A F B 1 ACTION6
- C F B 2 ACTION2
- C E D 5,2 ACTION3
-
- The global/default handlers are like macro insertions. For example:
-
- exception
- catch NoViableAlt : <<blah blah>>
-
- a : A
- ;
- exception
- catch MismatchedToken : <<ack;>>
-
- b : B
- ;
-
- This is functionally equivalent to:
-
- a : A
- ;
- exception
- catch MismatchedToken : <<ack;>>
- catch NoViableAlt : <<blah blah>>
-
- b : B
- ;
- exception
- catch NoViableAlt : <<blah blah>>
-
- [You will soon be able to throw the same or a different signal from an
- exception handler. This makes sense when you want a "panic" signal to
- propagate back up the parse tree so that all rules can clean up.]
-
-
- Modifications to Code Generation
-
- The following generic changes were made to facilitate exception
- handling:
-
- o Each rule reference now has a signal parameter which returns 0 if no
- error occurred during that reference or it returns a nonzero signal S.
-
- o The MATCH() macro has been modified to signal Mismatched token
- rather than calling zzsyn().
-
- o When no viable alternative is found, NoViableAlt is signaled
- rather than calling the zzsyn() routine.
-
- o The parser no longer resynchronizes automatically. See below.
-
-
- Semantic Predicates and NoSemViableAlt
-
- When the input stream does not predict any of the alternatives in the
- current list of possible alternatives, NoViableAlt is signaled.
- However, what happens when semantic predicates are specified in that
- alternative list? It turns out that the following simple strategy
- handles this case:
-
- ``If NoViableAlt is signaled and at least one semantic predicate (for
- a syntactically viable alternative) failed, signal NoSemViableAlt
- instead of NoViableAlt.''
-
- It would be very misleading to just throw NoViableAlt when in fact one
- or more alternatives were syntactically viable; i.e., the reason that
- no alternative was predicted was due to a semantic invalidity--a
- different signal must be thrown. For example,
-
- expr : <<P1>>? ID ... /* function call */
- | <<P2>>? ID ... /* array reference */
- | INT
- ;
- exception
- catch NoViableAlt : <<no ID or INT was found>>
- catch NoSemViableAlt : <<an ID was found, but it was not valid>>
-
- Typically, the programmer would want to give very different error
- messages for the two different situations. Specifically, giving a
- message such as "syntax error at ID missing { ID INT }" would be very
- misleading (i.e., wrong).
-
- In future versions, we may enhance this to throw a different signal
- depending on which predicate failed.
-
- Semantic predicates that are not used to predict alternatives do not
- yet throw signals. You must continue to use the fail-action for that
- predicate.
-
-
- Resynchronizing the Parser
-
- When an error occurs while parsing rule R, the parser will generally
- not be able to continue parsing anywhere within that rule--it will
- return immediately after executing any exception code. The one
- exception is for exceptions with labels. In this case, the parser
- knows exactly where in the alternative you would like to continue
- parsing from.
-
- To allow manual resynchronization of the parser, you may use two new
- functions:
-
- zzconsumeUntil(X_set)
- Consume tokens until you see a token in the token class X. ANTLR
- now generates a packed bitset called X_set for each token class X.
-
- zzconsumeUntilToken(T)
- Consume tokens until you see token T.
-
- For example,
-
- #tokclass RESYNCH { A C }
-
- a : b A
- | b C
- ;
-
- b : B
- exception
- catch MismatchedToken : // consume until FOLLOW(b)
- <<print error message; zzconsumeUntil(RESYNCH_set);>>
- ;
-
- Soon, you will be able to reference the FOLLOW(R) and FIRST(R) for a
- rule R to make resynchronization set construction easier.
-
- You may also use zzset_el() to test for membership in a token class.
-
- int zzset_el(unsigned b, SetWordType *p)
-
- For example,
-
- <<if ( zzset_el(LA(1), X_set) ) blah blah blah;>>
-
- In C++ mode, simply remove the "zz" prefix from the function names.
-
-
- Example
-
- The following grammar demonstrates the use of parser exception
- handling using the C interface (testcpp/13/test.g provides a C++
- example). Given input:
-
- if a+ then a=b+b;
-
- the output should be:
-
- invalid conditional in 'if' statement
- found assignment to a
-
- ----------------------------------------------------------------
- #header <<#include "charbuf.h">>
-
- <<
- main()
- {
- ANTLR(rule(), stdin);
- }
- >>
-
- #token "[\ \t]+" <<zzskip();>>
- #token "\n" <<zzskip(); zzline++;>>
- #token THEN "then"
- #tokclass DIE { "@" "if" ID "else" }
-
- rule: ( stat )+
- ;
-
- stat: "if" t:expr "then" stat { "else" stat }
- | u:ID "=" expr ";"
- <<printf("found assignment to %s\n", $u.text);>>
- ;
- exception[t]
- default :
- <<
- printf("invalid conditional in 'if' statement\n");
- zzconsumeUntilToken(THEN);
- >>
- exception
- catch MismatchedToken :
- catch NoViableAlt :
- catch NoSemViableAlt :
- <<
- printf("stat:caught predefined signal\n");
- zzconsumeUntil(DIE_set);
- >>
-
- expr: expr1 ("\+" expr1)*
- ;
-
- expr1
- : expr2 ("\*" expr2)*
- ;
-
- expr2: ID
- ;
-
- #token ID "[a-z]+"
- ----------------------------------------------------------------
-
- C. LEXCLASS SAVE/RESTORE
-
- [In the words of Ken Weinert:]
-
- This section describes the addition of the mode stack operations
- defined in Tom Moog's NOTES.newbie document. By defining a variable
- USER_ZZMODE_STACK at compile time, you can include these functions for
- use in your grammars.
-
- By default, the #define ZZSTACK_MAX_MODE is set to 32 which gives a
- mode stack of depth 64. If this is too shallow, defining
- ZZSTACK_MAX_MODE to a different value in the makefile will override
- the default.
-
- Four functions are defined with this modification: zzmpush(),
- zzmpop(), zzsave_mode_stack(), and zzrestore_mode_stack().
-
- zzmpush(int newMode)
- push the current mode on the stack and make newMode the current mode.
-
- zzmpop()
- make current the previous mode
-
- zzsave_mode_stack(int modeStack[], int *modeLevel)
- save the current mode stack and depth into indicated variables.
- Analogous to zzsave_[antlr|dlg]_state().
-
- zzrestore_mode_stack(int modeStack[], int *modeLevel)
- restore the mode stack and depth from indicated variables.
- Analogous to zzrestore_[antlr|dlg]_state().
-
- II. CHANGES AND BUG FIXES
-
- o Tokens may now be labeled rather than using the $i interface. The
- two are mutually exclusive in the sense that using $i prevents usage
- of $label.
-
- o Added parser exception handling.
-
- o Added #ifdef gate to tokens.h and class definition file for C/C++
- mode.
-
- o Made the #tokdefs file reader ignore gates on the head of files.
-
- o You can no longer pass init-actions to blocks like this (A|B)[action].
-
- o There was a bug in the code generation for (...)* loops for k>1.
- It used a linear approximate lookahead decision instead of full LL(k).
- It also used the wrong prediction expression for exiting the loop.
-
- o Token class X now results in C/C++ bitset X_set rather than zzerr%d.
-
- o ANTLR now allows "@" in a token class.
-
- o Added consumeUntil and consumeUntilToken functions for resynching
- parser in exception handlers.
-
- o Added testcpp/13 to test exception handling.
-
- o A warning is now generated if you reference a non-token element
- with $i for some integer i in C++ mode.
-
- o There was a bug in the code generation for (...)+ loops. It
- did not put predicates into "do while"--just at front if-gate.
- E.g.,
-
- a : ( b )+ ;
- b : <<pred>>? A;
-
- did not work.
-
- o For 16 bit machines, the ANTLRTokenBuffer file was using some
- implementation-dependent pointer manipulation. This has been
- changed.
-
- o Semantic predicates and -gl option did not work because ANTLR
- tried to insert line information at the end of a line rather
- that at the beginning where the C/C++ compiler wants it.
-
- o Added "const" to the arguments of the zzerr pointer and zzerrstd().
- If you have defined a dlg error routine, you must change the definition
- accordingly.
-
-
- III. A FEW KNOWN BUGS
-
- o I would like to make an auto test script for the C++ samples to
- check their input/output.
-
- o I haven't figured out how to handle deallocation of token objects
- created. Sometimes the programmer has allocated them in a totally
- different application and doesn't want ANTLR to track and delete
- them.
-
- o FOLLOW(rule) and FIRST(rule) capabilities need to be added to the
- tokclass declaration.
-
- o No warning is given for nonstandard signals used in parser exception
- handling.
-
- o No warning is given if $label is not defined.
-
- o Nothing has been done about the failure of semantic predicates
- with regards to exception handling.
-
- o The demand lookahead in C++ mode is still wacked.
-
- o Can't do multiple return values in C++ still.
-
- o No warning is given if all alts of a block are guesses: (...)?.
-
-
- IV. ACKNOWLEDGEMENTS
-
- Thanks to all you folks who tested 1.30 before its release. Thanks to
- the folks at the First Annual PCCTS workshop (held at NeXT July 1994)
- who helped refine the syntax for the parser exception handling.
-
- Thanks goes to the folks at NeXT for prodding us about good error
- recovery.
-